Ensemble Fuzzy Clustering using Cumulative Aggregation on Random Projections
Abstract
Random projection is a popular dimensionality-reduction method because of its simplicity and efficiency. In the past few years, cluster ensemble approaches based on random projection and fuzzy c-means (FCM) have been developed for clustering high-dimensional data. However, they require a large amount of memory to store a big affinity matrix, and they incur long computation times when clustering that matrix. In this paper, we propose a new cluster ensemble framework for high-dimensional data based on random projection and fuzzy c-means. Our framework uses cumulative agreement to aggregate fuzzy partitions. The fuzzy partitions obtained from the random projections are ranked using external and internal cluster validity indices. The best partition in the ranked queue becomes the core (or base) partition. The remaining partitions then contribute cumulatively to the core, arriving at a consensus partition that is the best overall partition built from the ensemble. Experimental results on Gaussian mixture datasets and a variety of real datasets demonstrate that our approach outperforms three state-of-the-art methods in terms of accuracy and space-time complexity. Our algorithm runs one to two orders of magnitude faster than the other state-of-the-art algorithms.
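The abstract describes the pipeline only at a high level, so the following is a minimal pure-Python sketch of it under stated assumptions, not the authors' implementation. Assumptions: a dense Gaussian random projection scaled by 1/sqrt(k); standard fuzzy c-means with fuzzifier m = 2; ranking by Bezdek's partition coefficient alone (one internal index standing in for the paper's combination of external and internal indices); cluster-label alignment by brute-force permutation matching; and a running mean of aligned memberships as one hypothetical reading of "cumulative agreement". All function names (`random_projection`, `fuzzy_c_means`, `partition_coefficient`, `align_to`, `ensemble_fcm`) are illustrative.

```python
import itertools
import math
import random


def random_projection(data, k, rng):
    """Project each d-dimensional row onto k dimensions with a dense Gaussian
    random matrix scaled by 1/sqrt(k) (Johnson-Lindenstrauss style)."""
    d = len(data[0])
    r = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * r[i][j] for i in range(d)) for j in range(k)] for x in data]


def fuzzy_c_means(data, c, rng, m=2.0, iters=100, eps=1e-9):
    """Standard FCM: alternate center and membership updates.
    Returns an n x c membership matrix whose rows sum to 1."""
    n, dim = len(data), len(data[0])
    u = []
    for _ in range(n):  # random initial memberships, normalized per row
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(iters):
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * data[i][t] for i in range(n)) / sw
                            for t in range(dim)])
        for i in range(n):
            # Clamp distances so a point sitting on a center cannot divide by zero.
            dists = [max(math.dist(data[i], ctr), eps) for ctr in centers]
            u[i] = [1.0 / sum((dists[j] / dists[h]) ** (2.0 / (m - 1.0))
                              for h in range(c))
                    for j in range(c)]
    return u


def partition_coefficient(u):
    """Bezdek's partition coefficient (an internal index): between 1/c and 1,
    higher means a crisper partition."""
    return sum(v * v for row in u for v in row) / len(u)


def align_to(reference, partition):
    """Permute the columns of `partition` to best match `reference`
    (brute force over all c! permutations, so only practical for small c)."""
    c = len(reference[0])

    def overlap(perm):
        return sum(r[j] * q[perm[j]] for r, q in zip(reference, partition)
                   for j in range(c))

    best = max(itertools.permutations(range(c)), key=overlap)
    return [[q[best[j]] for j in range(c)] for q in partition]


def ensemble_fcm(data, c, n_projections=5, k=2, seed=0):
    """Hypothetical pipeline: project -> FCM -> rank -> cumulative running mean."""
    rng = random.Random(seed)
    partitions = [fuzzy_c_means(random_projection(data, k, rng), c, rng)
                  for _ in range(n_projections)]
    # Rank partitions; the crispest one becomes the core partition.
    partitions.sort(key=partition_coefficient, reverse=True)
    consensus = [row[:] for row in partitions[0]]
    # Fold in the remaining partitions one at a time as a running mean,
    # so the consensus equals the mean of all aligned partitions.
    for t, p in enumerate(partitions[1:], start=1):
        aligned = align_to(consensus, p)
        consensus = [[(t * a + b) / (t + 1) for a, b in zip(ra, rb)]
                     for ra, rb in zip(consensus, aligned)]
    labels = [max(range(c), key=row.__getitem__) for row in consensus]
    return consensus, labels


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: two well-separated 10-dimensional Gaussian blobs.
    data = ([[rng.gauss(0.0, 0.3) for _ in range(10)] for _ in range(20)]
            + [[rng.gauss(10.0, 0.3) for _ in range(10)] for _ in range(20)])
    _, labels = ensemble_fcm(data, c=2)
    print(labels)
```

Because this sketch never builds an n × n affinity matrix, its memory grows as O(P·n·c) for P partitions rather than O(n²), which is in the spirit of the abstract's criticism of affinity-matrix approaches. The brute-force alignment step is the obvious bottleneck for large c; a Hungarian-style assignment would be the usual replacement.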
Similar resources
A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble
Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several base clusterings, and then, for their aggregation, a function is used to create a final cluster that is as similar as possible to all the cluster bundles. The inp...
Fuzzy ensemble clustering based on random projections for DNA microarray data analysis
OBJECTIVE Two major problems related to the unsupervised analysis of gene expression data are represented by the accuracy and reliability of the discovered clusters, and by the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work consists in the exploration of new strategies and in the...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
A fuzzy logic-based clustering algorithm for network optimisation
The rate of occurrence of high-dimensional data is much higher and, sad to relate, classical clustering techniques do not hold good for such high-dimensional networks of arbitrary shapes in the underwater wireless sensor network. This is mainly due to the fact that clustering techniques are highly parameterised. Data aggregation using traditional clustering techniques is a problem for high-dimension...
Leveraging Ensemble Models in SAS Enterprise Miner™
Ensemble models combine two or more models to enable a more robust prediction, classification, or variable selection. This paper describes three types of ensemble models: boosting, bagging, and model averaging. It discusses go-to methods, such as gradient boosting and random forest, and newer methods, such as rotational forest and fuzzy clustering. The examples section presents a quick setup th...